..
  ::
  module language.core-language where

.. _core-language:

*************
Core language
*************

A program in Agda consists of a number of declarations written in an ``*.agda``
file. A declaration introduces a new identifier and gives its type and
definition. It is possible to declare:

* :ref:`datatypes <data-types>`
* :ref:`record types <record-types>` (including
  :ref:`coinductive records <copatterns-coinductive-records>`)
* :ref:`function definitions <function-definitions>`
  (including :ref:`mixfix operators <mixfix-operators>` and
  :ref:`abstract definitions <abstract-definitions>`)
* :ref:`modules <module-basics>`
* local definitions :ref:`let <let-expressions>` and
  :ref:`where <where-blocks>`
* :ref:`postulates <postulates>`
* :ref:`variables <generalization-of-declared-variables>`
* :ref:`pattern-synonyms <pattern-synonyms>`
* :ref:`precedence <precedence>` (fixity)
* :ref:`pragmas <pragmas>`, and
* :ref:`program options <command-line-options>`

Declarations have a signature part and a definition part. These can appear
separately in the program. Names must be declared before they are used, but
by separating the signature from the definition it is possible to define things
in :ref:`mutual recursion <mutual-recursion>`.

Grammar
-------

At its core, Agda is a dependently typed lambda calculus. The grammar of
terms where ``a`` represents a generic term is:

.. code-block:: text

  a ::= x                       -- variable
      | λ x → a                 -- lambda abstraction
      | f                       -- defined function
      | (x : a) → a             -- function space
      | F                       -- data/record type
      | c a                     -- data/record constructor
      | s                       -- sort Seti, Setω+i


Syntax overview
---------------

The syntax of an Agda program is defined in terms of three key components:

* **Expressions** write function bodies and types.
* **Declarations** declare types, data-types, postulates, records, functions etc.
* **Pragmas** define program options.

There are also three main levels of syntax, corresponding to different levels
of interpretation:

* **Concrete** is the high-level sugared syntax, it representing exactly what
  the user wrote (Agda.Syntax.Concrete).
* **Abstract**, before typechecking (Agda.Syntax.Abstract)
* **Internal**, the full-intepreted core Agda terms, typechecked; roughly
  corresponding to (Agda.Syntax.Internal).

The process of translating an ``*.agda`` file into an executable has several
stages:

.. code-block:: text

  *.agda file
     ==[ parser (Lexer.x + Parser.y) ]==>
  Concrete syntax
     ==[ nicifier (Syntax.Concrete.Definitions) ]==>
  'Nice' concrete syntax
     ==[ scope checking (Syntax.Translation.ConcreteToAbstract) ]==>
  Abstract syntax
     ==[ type checking (TypeChecking.Rules.*) ]==>
  Internal syntax
     ==[ Agda.Compiler.ToTreeless ]==>
  Treeless syntax
     ==[ different backends (Compiler.MAlonzo.*, Compiler.JS.*, ...) ]==>
  Source code
     ==[ different compilers (GHC compiler, ...) ]==>
  Executable

The following sections describe these stages in more detail:

Lexer
-----

.. _Alex: http://www.haskell.org/alex

Lexical analysis (aka tokenization) is the process of converting a sequence of
characters (the raw ``*.agda`` file) into a sequence of tokens (strings with a
meaning).

The lexer in Agda is generated by Alex_, and is an adaptation of GHC's lexer.
The main lexing function ``lexer`` is called by the
``Agda.Syntax.Parser.Parser`` to get the next token from the input.

Parser
------

.. _Happy: http://www.haskell.org/happy

The parser is the component that takes the output of the lexer and builds a
data structure that we will call Concrete Syntax, while checking for correct
syntax.

The parser is generated by Happy_.

Example: when a name is a sequence of parts, the lexer just sees it as a
string, the parser does the translation in this step.


Concrete Syntax
---------------

The concrete syntax is a raw representation of the program text without any
desugaring at all.  This is what the parser produces. The idea is that if we
figure out how to keep the concrete syntax around, it can be printed exactly
as the user wrote it.

Nice Concrete Syntax
--------------------

The ``Nice Concrete Syntax`` is a slightly reorganized version of the
``Concrete Syntax`` that is easier to deal with internally. Among other
things, it:

* detects mutual blocks
* assembles :ref:`definitions <module-basics>` from their isolated parts
* collects fixity information of :ref:`mixfix operators <mixfix-operators>`
  and attaches it to definitions
* emits warnings for possibly unintended but still valid declarations, which
  essentially is dead code such as empty ``instance`` blocks and misplaced
  :ref:`pragmas <pragmas>`

Abstract Syntax
---------------

The translation from ``Agda.Syntax.Concrete`` to ``Agda.Syntax.Abstract``
involves scope analysis, figuring out infix operator precedences and tidying
up definitions.

The abstract syntax ``Agda.Syntax.Abstract`` is the result after desugaring
and scope analysis of the concrete syntax. The type checker works on abstract
syntax, producing internal syntax.

Internal Syntax
---------------

This is the final stage of syntax before being handed off to one of the
backends. Terms are well-scoped and well-typed.

While producing the ``Internal Syntax``, terms are checked for safety. This
safety check means :ref:`termination check <termination-checking>` and
coverage check for functions, and :ref:`positivity check <positivity-checking>`
for datatypes.

Type-directed operations such as
:ref:`instance resolution <instance-resolution>` and disambiguation of
overloaded constructors (different constructors with the same name) also
happen here.

The internal syntax ``Agda.Syntax.Internal`` uses the following haskell
datatype to represent the grammar of a ``Term`` presented above.

.. code-block:: haskell

  data Term = Var {-# UNPACK #-} !Int Elims -- ^ @x es@ neutral
          | Lam ArgInfo (Abs Term)        -- ^ Terms are beta normal. Relevance is ignored
          | Lit Literal
          | Def QName Elims               -- ^ @f es@, possibly a delta/iota-redex
          | Con ConHead ConInfo Elims
          -- ^ @c es@ or @record { fs = es }@
          --   @es@ allows only Apply and IApply eliminations,
          --   and IApply only for data constructors.
          | Pi (Dom Type) (Abs Type)      -- ^ dependent or non-dependent function space
          | Sort Sort
          | Level Level
          | MetaV {-# UNPACK #-} !MetaId Elims

Treeless Syntax
---------------

The treeless syntax is intended to be used as input for the
:ref:`compiler backends <compiler-backends>`. It is more low-level than the
internal syntax and is not used for type checking. Some of the features of
the treeless syntax are:

* case expressions instead of case trees
* no instantiated datatypes / constructors

For instance, the :ref:`Glasgow Haskell Compiler (GHC) backend <ghc-backend>`
translates the treeless syntax into a proper GHC Haskell program.

Another backend that may be used is the
:ref:`JavaScript backend <javascript-backend>`, which translates the treeless
syntax to JavaScript code.

.. _a-normal-form:

The treeless representation of the program has `A-normal form
<https://en.wikipedia.org/wiki/A-normal_form>`_ (ANF). That means that all the
case expressions are targeting a *single* variable, and all alternatives may
only peel off one constructor.

The backends can handle an ANF syntax easier than a syntax of a language where
one may case arbitrary expressions and use
:ref:`deep patterns <with-abstraction>`.
